
Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

Gumbsch, Christian, Butz, Martin V., Martius, Georg

Neural Information Processing Systems

A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. To study this hypothesis, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sampling efficiency and overall performance in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.
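
To make the inductive bias concrete, below is a minimal PyTorch sketch of a recurrent cell with a rectified-tanh update gate and a differentiable surrogate for the L_0 penalty on latent state changes. This is an illustration under simplified assumptions; the class name, layer shapes, and the exact surrogate are ours, not the authors' reference implementation.

```python
# Minimal sketch of a GateL0RD-style recurrent cell (illustration only).
import torch
import torch.nn as nn

class SparseGateCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, noise_std: float = 0.1):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor, h: torch.Tensor):
        xh = torch.cat([x, h], dim=-1)
        pre = self.gate(xh)
        if self.training:
            # Pre-activation noise pushes the gate toward decisive
            # (fully open or fully closed) values.
            pre = pre + self.noise_std * torch.randn_like(pre)
        g = torch.tanh(pre).clamp(min=0.0)  # rectified tanh: exact zeros possible
        h_new = g * torch.tanh(self.candidate(xh)) + (1.0 - g) * h
        # Because g can be exactly zero, penalizing its mean activation acts
        # as a simple differentiable surrogate for the L0 norm of state changes.
        l0_surrogate = g.mean()
        return h_new, l0_surrogate
```

A training loop would add lambda * l0_surrogate to the prediction loss, trading prediction accuracy against how often the latent state is allowed to change.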


Reinforcement Learning under State and Outcome Uncertainty: A Foundational Distributional Perspective

Preuett, Larry III, Zhang, Qiuyi, Ahmad, Muhammad Aurangzeb

arXiv.org Artificial Intelligence

In many real-world planning tasks, agents must tackle uncertainty about the environment's state and variability in the outcomes of any chosen policy. We address both forms of uncertainty as a first step toward safer algorithms in partially observable settings. Specifically, we extend Distributional Reinforcement Learning (DistRL), which models the entire return distribution for fully observable domains, to Partially Observable Markov Decision Processes (POMDPs), allowing an agent to learn the distribution of returns for each conditional plan. Concretely, we introduce new distributional Bellman operators for partial observability and prove their convergence under the supremum p-Wasserstein metric. We also propose a finite representation of these return distributions via ψ-vectors, generalizing the classical α-vectors in POMDP solvers. Building on this, we develop Distributional Point-Based Value Iteration (DPBVI), which integrates ψ-vectors into a standard point-based backup procedure, bridging DistRL and POMDP planning. By tracking return distributions, DPBVI lays the foundation for future risk-sensitive control in domains where rare, high-impact events must be carefully managed. We provide source code to foster further research in robust decision-making under partial observability.
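
For intuition, one way to write a distribution-valued Bellman backup over conditional plans in standard POMDP notation (T, O, R, γ) is sketched below. This is our paraphrase of the idea; the paper's exact operators and ψ-vector construction may differ.

```latex
% Sketch: a distributional backup over conditional plans. A conditional
% plan \sigma fixes an action a_\sigma and maps each observation o to a
% subplan \nu_\sigma(o). Equality is in distribution.
\[
\eta^{\sigma}(s) \;\stackrel{D}{=}\;
  R(s, a_\sigma) \;+\; \gamma\, \eta^{\nu_\sigma(o)}(S'),
\qquad
S' \sim T(\cdot \mid s, a_\sigma), \quad
o \sim O(\cdot \mid S', a_\sigma).
\]
% At a belief b, the return distribution is the b-weighted mixture of the
% per-state distributions, mirroring how \alpha-vectors are combined:
\[
\eta^{\sigma}(b) \;=\; \sum_{s} b(s)\, \eta^{\sigma}(s).
\]
```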


Online Learning and Planning in Partially Observable Domains without Prior Knowledge

Liu, Yunlong, Zheng, Jianyang

arXiv.org Artificial Intelligence

How an agent can act optimally in stochastic, partially observable domains is a challenging problem. The standard approach is to first learn a model of the domain and then use the learned model to find a (near-)optimal policy. However, learning the model offline often requires storing the entire training data and cannot exploit the data generated during the planning phase. Furthermore, current research usually assumes the learned model is accurate or presupposes knowledge of the nature of the unobservable part of the world. In this paper, for systems with discrete settings, a model-based planning approach built on Predictive State Representations (PSRs) is proposed in which the learning and planning phases can both be executed online and no prior knowledge of the underlying system is required. Experimental results show that, compared to state-of-the-art approaches, our algorithm achieves a high level of performance with no prior knowledge provided, while retaining the theoretical advantages of PSRs. Source code is available at https://github.com/DMU-XMU/PSR-MCTS-Online.
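
As a rough illustration of why PSRs suit the online setting, here is a minimal sketch of a linear-PSR state update. The names are hypothetical: the operator matrices B[(a, o)] and the normalizer b_inf would come from the online learning phase, which is not shown.

```python
# Minimal sketch of a linear PSR state update (illustrative only).
import numpy as np

def psr_update(b, a, o, B, b_inf):
    """Condition the predictive state b on taking action a and observing o."""
    v = B[(a, o)] @ b            # apply the learned observable operator
    p_o = float(b_inf @ v)       # predicted probability of observing o
    if p_o <= 0.0:
        raise ValueError("observation has zero predicted probability")
    return v / p_o, p_o
```

A Monte-Carlo tree search (as in PSR-MCTS) can then simulate trajectories entirely in predictive-state space, sampling each observation with probability p_o and recursing on the updated state.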


Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains

Doshi-Velez, Finale (Massachusetts Institute of Technology)

AAAI Conferences

The objective of my doctoral research is to bring together two fields, partially observable reinforcement learning (PORL) and non-parametric Bayesian statistics (NPB), to address issues of statistical modeling and decision-making in complex, real-world domains.


Automated Hierarchy Discovery for Planning in Partially Observable Environments

Charlin, Laurent, Poupart, Pascal, Shioda, Romy

Neural Information Processing Systems

Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems. Several approaches have been proposed to optimize a policy that decomposes according to a hierarchy specified a priori. In this paper, we investigate the problem of automatically discovering the hierarchy. More precisely, we frame the optimization of a hierarchical policy as a non-convex optimization problem that can be solved with general nonlinear solvers, a mixed-integer nonlinear approximation or a form of bounded hierarchical policy iteration. By encoding the hierarchical structure as variables of the optimization problem, we can automatically discover a hierarchy. Our method is flexible enough to allow any parts of the hierarchy to be specified based on prior knowledge while letting the optimization discover the unknown parts. It can also discover hierarchical policies, including recursive policies, that are more compact (potentially infinitely fewer parameters) and often easier to understand given the decomposition induced by the hierarchy.
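
For intuition about what encoding a policy as variables of an optimization problem looks like, below is a standard non-convex program for a stochastic finite-state-controller policy. This is a sketch in generic notation; the paper's encoding additionally introduces variables for the hierarchical (subcontroller) structure on top of such a formulation.

```latex
% Sketch: policy optimization over a finite-state controller with nodes n,
% action-selection variables \psi, and node-transition variables \eta.
% Products of decision variables (\psi, \eta, V) make the program non-convex.
\begin{align*}
\max_{\psi,\,\eta,\,V}\quad & \sum_{s} b_0(s)\, V(n_0, s) \\
\text{s.t.}\quad
 & V(n, s) = \sum_{a} \psi(a \mid n) \Big[ R(s, a)
   + \gamma \sum_{s', o} T(s' \mid s, a)\, O(o \mid s', a)
     \sum_{n'} \eta(n' \mid n, a, o)\, V(n', s') \Big], \\
 & \psi(\cdot \mid n) \text{ and } \eta(\cdot \mid n, a, o)
   \text{ are probability distributions.}
\end{align*}
```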

